Proteotranscriptomics of ocular adnexal B-cell lymphoma reveals an oncogenic role of alternative splicing and identifies a diagnostic marker

Background Ocular adnexal B-cell lymphoma (OABL) is a rare subtype of non-Hodgkin lymphoma. The molecular characteristics of OABL remain poorly understood. We performed an integrated study to investigate the proteotranscriptome landscape and identify novel molecular characteristics and biomarkers of OABL. Methods Integrated quantitative proteomic and transcriptomic analyses were performed on 40 OABL, 12 idiopathic orbital inflammation, 6 reactive lymphoid hyperplasia, and 13 aesthetic orbital plastic surgery specimens. Complete clinicopathologic and prognostic data of the patients were recorded. Results We identified high global protein-mRNA concordance as a novel characteristic of OABL. High concordance was related to OABL recurrence. By integrated expression profile, motif enrichment and trend analysis, we found that alternative splicing is dysregulated in OABL independently of inflammation. After portraying the aberrant alternative splicing event landscape, we demonstrated the oncogenic role of ADAR, a core splicing regulator that regulates the splicing of Rho GTPase and cell cycle members. We found that ADAR regulates cell proliferation and Rho GTPase inhibitor sensitivity of lymphoma. We identified DNAJC9 as a potential biomarker for OABL in proteomic analyses. Immunohistochemistry and immunofluorescent staining showed that nuclear staining of DNAJC9 was significantly higher in extranodal marginal zone lymphomas than in inflammation specimens. Conclusions These results provide an integrated gene expression profile and demonstrate that high global protein-mRNA concordance is a prognosis-related molecular characteristic of OABL. We portray the alternative splicing event landscape of OABL and reveal the oncogenic role of ADAR. We identified strong nuclear staining of DNAJC9 as a promising pathological diagnostic biomarker for extranodal marginal zone lymphomas.
Supplementary Information The online version contains supplementary material available at 10.1186/s13046-022-02445-8.

While gene expression profiling has led to landmark discoveries of NHLs, few studies have examined ocular adnexal B-cell lymphomas (OABLs) [7][8][9]. Furthermore, defining the biology of NHLs solely based on the transcriptome is challenging. By combining proteomic and transcriptomic data, proteotranscriptome-based studies have revealed novel insights into the development and progression of malignancies [10,11], with findings that cannot be revealed by mRNA-based studies. By investigating mass spectrometry (MS)-based TMT labeling quantitative proteome and transcriptome, we provided an integrated gene expression landscape of OABL, revealed the global protein-mRNA concordance as a novel prognostic-related disease characteristic, and identified a novel pathology diagnostic marker.
Our analysis also revealed the importance of the alternative splicing pathway in OABL. Alternative splicing is a post-transcriptional gene regulation mechanism that contributes to protein diversity [12]. Dysregulation of alternative splicing has been shown to contribute to the development and progression of various types of malignancy [13]. While some studies have identified mutations of splicing factors (SFs) in mantle cell lymphomas (MCLs), alternative splicing in NHLs has not been well studied [14]. We provided a landscape of alternative splicing events (ASEs) and their potential biological implications in OABL, and further demonstrated the oncogenic nature of the splicing regulator ADAR in OABL.

Patient selection and ethical approval
We reviewed our medical records database to identify patients confirmed by surgical biopsy at the Department of Ophthalmology, Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine from January 2016 to February 2020. The inclusion criteria were as follows: (1) histologically confirmed diagnosis of OABL, idiopathic orbital inflammation (IOI), or reactive lymphoid hyperplasia (RLH), or orbital plastic surgery performed for aesthetic reasons; (2) availability of clinical and laboratory information at the time of diagnosis; and (3) specimen storage at −80 °C. Clinical data were obtained from medical records. IOI, RLH, and normal specimens were defined as controls. IOIs and RLHs were defined as the "inflammation" group in subgroup analysis (Supplementary Table S1).
The study protocol was approved by the institutional review board of Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine (protocol SH9H-2019-T185-2). Informed consent was obtained from all patients enrolled in the proteomic and transcriptomic analysis. The clinical characteristics of these patients (40 OABL patients and 31 controls) are summarized in Supplementary Table S2.

Protein sample preparation and sequencing
Proteomic and transcriptomic data were generated from 71 samples from the above-mentioned patients. The pathological sections were reviewed by three pathologists to validate the diagnosis before sequencing. All specimens were stored at −80 °C until protein and RNA isolation, and sequencing was performed by Beijing CapitalBio Technology Inc.
The experimental steps are described in the Supplementary Methods. Briefly, specimens were lysed using protein extraction buffer (8 M urea, 0.1% SDS) containing 1 mM phenylmethylsulfonyl fluoride (Beyotime Biotechnology, China) and protease inhibitor cocktail (Roche, USA). Tandem mass tags TMTpro (Pierce, USA) with different reporter ions (126-131 Da) were applied as isobaric tags for relative quantification, and TMT labeling was performed following the manufacturer's instructions. The MS analysis was conducted using a Q Exactive mass spectrometer (Thermo Scientific, USA). Proteome Discoverer software (version 1.4) (Thermo Scientific, USA) was used to perform database searching against the RefSeq database. The results were filtered using the following settings: high-confidence peptides with a global FDR < 1% based on a target-decoy approach. The proteomic data have been uploaded to the iProX database (https://www.iprox.org; project ID IPX0004253000).

RNA sample preparation and sequencing
RNA samples were prepared using TRIzol reagent (Ambion, 15596-026) following the manufacturer's protocol. The poly-A containing mRNA molecules were purified from RNA using poly-T oligo-attached magnetic beads. The fragments were reverse transcribed into first-strand cDNA using random hexamers, followed by second-strand cDNA synthesis using DNA polymerase I and RNase H. PCR was used to selectively enrich DNA fragments with adapter molecules on both ends and to amplify the amount of DNA in the library. The library was qualified using the Agilent 2100 Bioanalyzer and quantified by Qubit and qPCR. The produced libraries were sequenced on the Illumina NovaSeq 6000 platform. Reads were aligned to hg38. The RNA-seq data have been deposited in the Gene Expression Omnibus database (https://www.ncbi.nlm.nih.gov/geo) under accession numbers GSE171059 and GSE199517.

AASE identification and analysis
The aberrant alternative splicing events (AASEs) in OABLs were identified using rMATS [15]. All the sequences and annotations used in this analysis were based on the GRCh38 genome assembly. An ASE with a ΔInclevel value between the OABLs and controls of more than 5% (|ΔInclevel| > 0.05) and an adjusted p-value of < 0.01 was identified as an AASE. The list of annotated splicing factors and regulators was downloaded from the SpliceAid-F database and the study by Nostrand et al. [16,17] (Supplementary Table S9).
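The selection rule above can be sketched in a few lines of Python. This is only an illustration of the two stated cut-offs (an inclusion-level difference of more than 5% and an adjusted p-value below 0.01); the `SplicingEvent` record, its field names, and the example values are hypothetical and are not the rMATS output format.

```python
from dataclasses import dataclass


@dataclass
class SplicingEvent:
    # Hypothetical record; field names are illustrative, not rMATS columns.
    event_id: str
    inc_oabl: float     # mean inclusion level in OABLs (0..1)
    inc_control: float  # mean inclusion level in controls (0..1)
    adj_p: float        # adjusted p-value


def is_aase(ev, min_delta=0.05, max_adj_p=0.01):
    """Apply the two thresholds: |delta inclusion level| > 5% and adj p < 0.01."""
    delta = ev.inc_oabl - ev.inc_control
    return abs(delta) > min_delta and ev.adj_p < max_adj_p


# Made-up example events, not data from the study.
events = [
    SplicingEvent("SE_1", 0.62, 0.40, 0.001),  # large shift, significant
    SplicingEvent("SE_2", 0.51, 0.49, 0.001),  # shift below 5%
    SplicingEvent("RI_1", 0.10, 0.30, 0.20),   # large shift, not significant
]
aase_ids = [ev.event_id for ev in events if is_aase(ev)]
```

Only the first event passes both filters in this toy input.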

Differential expression analysis
For transcriptomic data, low-abundance transcripts were removed. For proteomic data, low-abundance proteins and proteins missing in > 20% of patients were removed, and the K-nearest neighbor imputation method was used to impute missing values for proteins missing in < 20% of patients. Differential expression analysis was performed using the limma R package (version 3.46.0) [18] after normalization. We set |log2(fold change)| > log2(1.5) and p-value < 0.05 as the threshold for transcriptome data, and |log2(fold change)| > log2(1.2) and p-value < 0.05 as the threshold for proteome data.
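As a rough illustration of how these filters and thresholds combine, the following Python sketch applies the missingness rule and the two fold-change cut-offs. It does not reproduce the limma model fit; the function names and example numbers are assumptions made for illustration.

```python
import math

# Fold-change cut-offs stated in the text.
TRANSCRIPT_CUTOFF = 1.5
PROTEIN_CUTOFF = 1.2


def passes_missingness(values, max_missing_fraction=0.20):
    """Keep a protein unless it is missing (None) in more than 20% of patients."""
    missing = sum(1 for v in values if v is None)
    return missing / len(values) <= max_missing_fraction


def is_differential(log2_fc, p_value, fold_change_cutoff, alpha=0.05):
    """Apply |log2(fold change)| > log2(cutoff) together with p < alpha."""
    return abs(log2_fc) > math.log2(fold_change_cutoff) and p_value < alpha


# Hypothetical gene with log2 fold change 0.4 and p = 0.01 in both layers.
protein_hit = is_differential(0.4, 0.01, PROTEIN_CUTOFF)
transcript_hit = is_differential(0.4, 0.01, TRANSCRIPT_CUTOFF)
```

Because log2(1.2) is about 0.26 and log2(1.5) is about 0.58, the same effect size of 0.4 counts as differential at the protein level but not at the transcript level.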

Cells and cell culture
NHL cell lines were provided by Cell Bank and Stem Cell Bank, Chinese Academy of Sciences. The Raji and SU-DHL-4 lymphoma cell lines were cultured in RPMI1640 (Invitrogen, CA) supplemented with 10% fetal bovine serum (Gibco) and 1% penicillin/streptomycin (Gibco) and maintained at 37 °C in a 5% CO2 humidified atmosphere.

Virus transduction and generation of stable cell lines
Two individual lentiviral vectors containing shRNAs targeting human ADAR (pLKO.1-shADAR#1: CCGGCGGATACTACACCCATCCATTCTCGAGAATGGATGGGTGTAGTATCCGTTTTTG; pLKO.1-shADAR#2: CCGGGCCCACTGTTATCTTCACTTTCTCGAGAAAGTGAAGATAACAGTGGGCTTTTTG) were purchased from Shanghai Genomditech (Shanghai, China). Non-targeting shRNA was used as the control. The lentiviral vectors and packaging vectors were transfected into the 293T packaging cell line using the PolyJet In Vitro DNA Transfection Reagent (SignaGen). Retroviral vectors were transfected. Targeted cells were infected with lentivirus in the presence of 8 μg/ml polybrene (Sigma).
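A quick consistency check on the two shRNA oligonucleotides can be written in Python. The sketch below assumes the usual pLKO.1 hairpin layout (a CCGG overhang, a sense stem, a CTCGAG loop, the reverse-complement antisense stem, and a TTTTTG terminator); that layout is inferred from the sequences themselves rather than stated in the text, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_hairpin(oligo, prefix="CCGG", loop="CTCGAG", suffix="TTTTTG"):
    """Split an oligo into (sense, antisense) stems under the assumed layout."""
    if not (oligo.startswith(prefix) and oligo.endswith(suffix)):
        raise ValueError("unexpected overhang or terminator")
    core = oligo[len(prefix):-len(suffix)]
    stem_len = (len(core) - len(loop)) // 2
    sense = core[:stem_len]
    middle = core[stem_len:stem_len + len(loop)]
    antisense = core[stem_len + len(loop):]
    if middle != loop:
        raise ValueError("loop not found at the expected position")
    return sense, antisense


# Sequences from the text with extraction whitespace removed.
OLIGOS = {
    "shADAR#1": "CCGGCGGATACTACACCCATCCATTCTCGAGAATGGATGGGTGTAGTATCCGTTTTTG",
    "shADAR#2": "CCGGGCCCACTGTTATCTTCACTTTCTCGAGAAAGTGAAGATAACAGTGGGCTTTTTG",
}

# True when the antisense stem is the reverse complement of the sense stem.
stems_pair = {
    name: reverse_complement(split_hairpin(seq)[0]) == split_hairpin(seq)[1]
    for name, seq in OLIGOS.items()
}
```

Both oligonucleotides split into 21-nt stems that pair with each other, which supports the reading of the sequences once the stray spaces are removed.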

Western blot
Cells were lysed in mammalian protein extraction reagent (Pierce). After protein quantification using a bicinchoninic acid protein assay kit (Pierce), 60 μg of total protein was separated by 10% SDS-PAGE under denaturing conditions and transferred to PVDF membranes (Millipore). Membranes were blocked in 5% non-fat milk (Bio-Rad) and then incubated with primary antibodies, followed by incubation with a secondary antibody conjugated with horseradish peroxidase (1:10,000; Amersham Biosciences). Immunoreactive proteins were visualized using the LumiGLO chemiluminescent substrate (Cell Signaling Technology). The primary antibodies used in this study were as follows: β-Actin (1:10,000; Sigma) and ADAR1 (1:1000; Cell Signaling Technology).

Cell proliferation assay
To assess cell proliferation, a Cell Counting Kit-8 (CCK-8, New Cell & Molecular Biotech, China) was used following the manufacturer's instructions. Cells were seeded into a 96-well plate at 1 × 10³ cells per well with 100 μl medium and cultured at 37 °C with 5% CO2. CCK-8 solution was added (10 μl per well) and cells were incubated for 3 h before measuring the absorbance at 450 nm.

Immunohistochemistry (IHC) and immunofluorescent (IF) staining
IHC and IF were performed following standard procedures. Five-micron thick formalin-fixed paraffin embedded (FFPE) human tissue sections were used for the experiments. Stained slides were digitized using the Pannoramic DESK (3D HISTECH) with a 40× objective lens.

Definitions
Local recurrence was defined as lymphoma recurrence at the orbital region. Distant recurrence was defined as lymphoma recurrence at an extra orbital site that was not initially involved. Progression-free survival (PFS) was calculated from the date of diagnosis to recurrence, progression, death, or the most recent follow-up. Overall survival (OS) was defined as the time from initial disease diagnosis to death by any cause or until the most recent follow-up. Recurrence-free survival (RFS), local recurrence-free survival (LRFS), and distant recurrence-free survival (DRFS) were calculated from the date of initial treatment to the corresponding recurrence, or until the most recent follow-up.

Statistical analysis
All statistical tests were two-sided and P-values < 0.05 were considered statistically significant. Statistical analyses were performed using the R software developed by the R Development Core Team at R Bioconductor [23] and GraphPad Prism (https://www.graphpad.com/scientific-software/prism/).
For comparison of continuous variables, we used the Mann-Whitney U test for correlation results as previously mentioned [11], and the Student's t-test for others. For correlation analysis, we applied the Spearman rank correlation test for global protein-mRNA correlation following previous observations [24], and Pearson's correlation test for other continuous variables. Survival curves were generated using the Kaplan-Meier method and compared by log-rank test. Cox proportional hazards models were constructed to identify prognostic factors for OABL. Adjusted hazard ratios with 95% confidence intervals were calculated. Logistic regression models and lasso regression models were constructed to identify diagnostic markers for OABL.
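For readers unfamiliar with the Kaplan-Meier method mentioned above, a minimal product-limit estimator is sketched below in plain Python. It is an illustration only and is not the R code used in the study; the function name and the follow-up times are made up.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times  -- follow-up time for each patient
    events -- 1 if the event (e.g. recurrence) occurred, 0 if censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve


# Hypothetical follow-up in months; 0 marks a censored patient.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 0, 1, 1, 0])
```

In this toy input the estimated survival drops to 0.8, 0.6, and 0.3 at months 5, 8, and 12; censored patients leave the risk set without lowering the curve.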

Integrated proteotranscriptomic landscape of OABL
To obtain a comprehensive view of the molecular characteristics of OABLs, we performed an integrated proteomic and transcriptomic analysis of 40 OABLs, including 28 EMZLs, 8 DLBCLs, 2 MCLs, and 2 small lymphocytic lymphomas (SLLs), and 31 control specimens including 12 IOIs, 6 RLHs, and 13 normal orbital tissues (Supplementary Table S1, Fig. 1A). Because of the contamination of plasma proteins, TMT labeled liquid chromatography-mass spectrometry was not performed in one EMZL, three IOI, and two normal samples. Therefore, after a quality control process, we obtained proteomic data of 39 OABLs and 26 controls (Supplementary Table S2). Because of RNA degradation, RNA-seq was not performed in 7 EMZLs, 1 DLBCL, 1 SLL, 4 IOIs, 3 RLHs, and 7 normal tissues. After quality control, we obtained transcriptomic data of 31 OABLs and 17 controls (Supplementary Table S3).
By performing hierarchical clustering of highly variable genes (HVGs) and samples in transcriptomic and proteomic data, we found that most of the OABLs and controls were divided into two groups. While OABL and normal specimens were clearly divided, some of the inflammation samples, especially RLH samples, were misclassified into the OABL group (Supplementary Fig. S1). These results validated our sampling and analysis processes and echoed the observed association between inflammation and lymphoma [25][26][27][28].
After removal of low-abundance transcripts/proteins and completion of missing values, we mapped transcripts and proteins to Ensembl IDs and detected 3639 protein-mRNA pairs in these samples. For these pairs, we calculated differentially expressed genes (DEGs) between OABLs and controls (Fig. 1B, Supplementary Table S4). The results showed that 787 genes were upregulated and 542 genes were downregulated in the proteome of OABLs compared with controls. In the transcriptome of OABLs, 806 genes were upregulated and 445 genes were downregulated compared with controls. After matching results of omics data, we found that 451 genes were concordantly upregulated (CO-UP), 386 genes were concordantly downregulated (CO-DOWN), and 19 genes were discordantly dysregulated. The CO-UP DEGs were mainly enriched in immune-related, GTPase signaling, negative regulation of phosphate metabolic process and DNA recombination terms. The CO-DOWN DEGs were mainly enriched in normal tissue development and organization, monocarboxylic acid, and arginine and proline metabolism terms (Fig. 1C).
To avoid reported potential systematic biases [10], we further performed GSEA to identify overlapping dysregulated gene sets in OABLs. A total of 1023 gene sets were commonly identified by proteomic and transcriptomic data (Supplementary Table S5). Among these, 763 gene sets were significantly dysregulated in at least one type of omics (FDR < 0.2), of which 725 were concordantly dysregulated and 38 were discordantly regulated (Fig. 1D). Arranged by the sum of NES rank in each omic, CO-UP gene sets were mainly enriched in mRNA processing and splicing, DNA damage and repair, and protein sumoylation. CO-DOWN gene sets were mainly enriched in normal tissue development and organization, and aerobic glucose metabolism (Fig. 1E). As our study contained multiple subtypes of OABL, we performed GSVA in the proteomic data and examined the robustness of the dysregulations (Supplementary Fig. S2A, Supplementary Table S6, Supplementary Methods). The variations of these top-ranked gene sets were consistent among all four subtypes. For discordantly dysregulated genes and pathways, immune, Golgi traffic and amide metabolism related gene sets were identified by DEG enrichment and GSEA analyses (Supplementary Fig. S2B, C).

Global protein-mRNA concordance is a distant recurrence-related characteristic of OABL
Next, we examined the relationship between protein and mRNA abundance and its association with disease characteristics. Global protein-mRNA concordance was computed as the Spearman correlation of all paired protein and mRNA abundances in each patient (Fig. 2A). We analyzed this concordance in patients with both proteomic and transcriptomic data (Fig. 1A). Considering the different transcript/protein abundance distribution between OABLs and controls, we analyzed protein-mRNA pairs separately in these two groups. We identified 3818 protein-mRNA pairs in OABLs and 3728 pairs in controls. The concordance was significantly higher in OABLs (median rho = 0.364) than in controls (median rho = 0.208, p = 0.01, Fig. 2B-C). Considering the association between inflammation and lymphoma [25][26][27][28], we compared the global concordance across subgroup specimens. The results showed that OABLs exhibited a relatively higher concordance than other groups (median rho of RLH = 0.254, IOI = 0.23, normal = 0.16). These data indicated that the increased correlation between protein and mRNA abundance is a disease-associated characteristic of OABL.
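The per-patient concordance described here can be illustrated with a small, dependency-free Python sketch: Spearman's rho is computed as the Pearson correlation of average ranks over the genes measured in both layers. The function names and the toy abundance values are hypothetical, and the study itself used R.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def global_concordance(protein, mrna):
    """Spearman rho across all genes quantified in both layers for one patient."""
    genes = sorted(set(protein) & set(mrna))
    return pearson(
        average_ranks([protein[g] for g in genes]),
        average_ranks([mrna[g] for g in genes]),
    )


# Toy abundances for one hypothetical patient; gene E has no protein value.
protein = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
mrna = {"A": 10.0, "B": 30.0, "C": 20.0, "D": 40.0, "E": 5.0}
rho = global_concordance(protein, mrna)
```

Only the four genes present in both dictionaries enter the calculation, mirroring the restriction to matched protein-mRNA pairs; the toy patient has a rho of 0.8.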
We next analyzed whether the concordance is associated with disease aggressiveness. First, we compared the concordance across subtypes, Ann Arbor stage, and prognostic risk factors (Fig. 2D). The concordance within non-EMZL subtypes was significantly higher than that of EMZL (p = 0.039), an indolent lymphoma subtype. High concordance was significantly associated with higher LDH (p = 0.014) and IPI (p = 0.046), two prognostic risk factors of NHL. High concordance showed a trend toward association with a higher Ann Arbor stage that did not reach significance (p = 0.054) (Supplementary Fig. S3A). Next, we evaluated the correlation between proliferation ability and global concordance (Fig. 2E). The result showed a positive association between Ki67 protein abundance and global concordance in OABLs (r = 0.495, p = 0.005), but this association was not present in the controls (r = 0.203, p = 0.527). Hence, higher global concordance was associated with disease aggressiveness solely in OABL.
We then tested the correlation between the protein-mRNA concordance and OABL prognosis. We divided OABL patients into two groups using the median value of concordance and compared PFS, OS, RFS, LRFS, and DRFS between the two groups (Fig. 2G, Supplementary Fig. S3B). Among the 15 patients in the high rho group, 6 showed recurrence, 1 showed local recurrence, and 6 showed distant recurrence. For the 15 patients in the low rho group, 2 showed recurrence, 1 showed local recurrence, and 1 showed distant recurrence (Supplementary Table S1). Survival analysis revealed that a globally increased concordance in OABL was significantly associated with reduced DRFS (p = 0.037) and showed a non-significant trend toward reduced RFS (p = 0.083), but was not associated with PFS (p = 0.19) or OS (p = 0.26).
We then compared the global concordance between patients with and without the recurrence events (Fig. 2F, Supplementary Fig. S3C). High concordance was significantly associated with distant recurrence events (p = 0.0034) and recurrence events (p = 0.0072), but not with local recurrence events (p = 0.41). We analyzed the relationship between the global concordance and recurrence in small B-cell lymphoma (SBL), EMZL, DLBCL, and other subtypes to ensure the robustness of the finding (Supplementary Fig. S3C). High concordance was significantly associated with distant recurrence events in SBL (p = 0.0059) and EMZL (p = 0.039). Despite the low incidence of recurrence and limited number of patients, the median value of patients with distant recurrence was still higher than that of patients without the events in DLBCL and other subtypes. These data demonstrated that the high global protein-mRNA concordance was a predictive factor for distant recurrence in OABL.
Fig. 2 The match-subject analysis identifies global protein-mRNA concordance as an OABL-associated characteristic. A Schematic diagram of global protein-mRNA concordance calculation. B Density plot showing the global Spearman correlation for protein-mRNA pairs within OABLs (n = 3818 protein-mRNA pairs) and controls (n = 3728 pairs). C Concordance of protein-mRNA pairs is significantly higher in OABLs compared with the control or normal group and relatively higher compared with the RLH or IOI group. D Global protein-mRNA concordance is associated with prognostic risk factors. Non-EMZL subtype, high LDH, and high IPI score have an increased concordance. E Global protein-mRNA concordance is positively correlated with the MKI67 proteomic abundance in the OABLs (r = 0.495, p = 0.005) but not in the controls (r = 0.203, p = 0.527). Blue line shows linear regression. F High global protein-mRNA concordance is associated with distant recurrence in OABL. G Kaplan-Meier plot shows high global concordance in OABLs is associated with decreased distant recurrence-free survival. H Bar plot of top 20 gene sets identified by GSVA correlated with global protein-mRNA concordance.
Next, we investigated the potential regulators and biological implications of the abnormally upregulated global protein-mRNA concordance. Because the global concordance is an intrinsic continuous variable, we examined the correlation between GSVA results and the concordance (Fig. 2H). Among the top 20 positively correlated gene sets, 8 were TP53-related gene sets, and the others were mostly immune-related gene sets. ECM-associated gene sets accounted for the majority of negatively correlated gene sets.
These findings indicate that increased global protein-mRNA concordance is a novel molecular characteristic of OABL that is associated with disease aggressiveness and higher risk of recurrence. This abnormally upregulated concordance in OABL is positively related to the TP53 pathway.

Trend analyses identify alternative splicing as an inflammation-independent signature of OABL
In the proteotranscriptomic data, we observed a similarity of molecular characteristics between inflammation and OABL samples through hierarchical clustering, principal component analysis, and global protein-mRNA concordance (Fig. 2C, Fig. 3A-B, Supplementary Fig. S1). As previous studies demonstrated the activation of the NFκB signaling pathway in both inflammation and NHL [25,26], we performed hierarchical clustering of the NFκB signaling pathway across subgroups (Fig. 3C). The abundance of NFκB-related genes progressively increased from normal to inflammation to OABL specimens, which was consistent with previous reports [25,26]. However, several questions remained: how extensive the similarity is; which pathways discriminate OABL from inflammation; and whether these pathways are driver events of OABL.
To address these questions, we constructed a robust inflammation-OABL signature in proteomic data by supervised and unsupervised clustering of genes across the normal, inflammation, and OABL groups. First, we performed a t-test of protein abundance between each pair of groups and hypothesized nine dysregulated patterns of genes (Fig. 3F, Supplementary Methods). Most genes fell into the upregulated patterns (cluster3u, 24.6%; cluster2u, 21.94%; cluster4u, 11.68%). Interestingly, in these upregulated patterns, inflammations could not be discriminated from OABLs for a large fraction of genes (906/2144, 42.3%). We additionally performed unsupervised k-means clustering of normalized HVGs (genes with top 50% MAD, k = 4) (Fig. 3D-E). Cluster numbers were determined by the elbow plot (Supplementary Fig. S4A).
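The HVG selection step (genes with top 50% MAD) can be sketched as follows. This assumes MAD denotes the median absolute deviation, which the text does not spell out; the function names and abundance values are hypothetical, and the k-means step itself is not reproduced.

```python
from statistics import median


def mad(values):
    """Median absolute deviation (assumed meaning of MAD in the text)."""
    m = median(values)
    return median(abs(v - m) for v in values)


def top_mad_genes(abundance, fraction=0.5):
    """Return the most variable genes, keeping the given top fraction by MAD."""
    ranked = sorted(abundance, key=lambda g: mad(abundance[g]), reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Toy normalized abundances across four samples.
abundance = {
    "G1": [1, 1, 1, 1],    # MAD 0
    "G2": [1, 5, 9, 13],   # MAD 4
    "G3": [2, 3, 2, 3],    # MAD 0.5
    "G4": [0, 10, 0, 10],  # MAD 5
}
hvgs = top_mad_genes(abundance)
```

With four toy genes, the top 50% keeps the two with the largest MAD, G4 and G2, which would then be passed to the clustering step.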
Combining the results of k-means clustering and t-test gene patterns, we identified five clusters of genes: inflammation mimic upregulated genes (MIMIC-UP), vaguely upregulated genes (VAGUE-UP), OABL specific upregulated genes (SPECIFIC-UP), inflammation mimic downregulated genes (MIMIC-DOWN), and OABL specific downregulated genes (SPECIFIC-DOWN) (Fig. 3G, Supplementary Table S7). Because upregulated genes constituted most of the clustered proteins, we focused on MIMIC-UP and SPECIFIC-UP, which represented extremely different patterns of dysregulation. MIMIC-UPs were mostly enriched in immune-related gene sets, while SPECIFIC-UPs were mostly enriched in gene sets related to mRNA metabolism and splicing, DNA damage and metabolism, and chromatin remodeling (Fig. 3H).
These results clearly demonstrated that the similarity between inflammation and OABL is not only in the NFκB pathway but also in a larger immune landscape. More importantly, we identified gene sets specifically dysregulated in OABL, including mRNA splicing and well-known pathways associated with malignancy development (DNA damage, chromatin remodeling).

Alternative splicing and its regulators potentially influence OABL development and progression
Alternative splicing plays an important role in OABL. Our findings revealed dysregulated alternative splicing in an inflammation-independent pattern in OABLs (Fig. 1E, Fig. 3H). The mRNA splicing gene set was upregulated in all subtypes of OABL (Supplementary Fig. S5A). We further investigated the enriched motif/domain of OABL in the proteome, and the RNA recognition motif was the most significantly enriched term (Supplementary Fig. S5B). Alternative splicing was reported to be associated with malignancy development and progression [29,30]. Therefore, we hypothesized that alternative splicing may play an oncogenic role in OABL.
We therefore constructed a workflow to 1) evaluate raw RNA-sequencing data and identify AASEs of OABLs compared with controls; 2) integrate clinical data to identify prognosis-related AASEs; and 3) combine proteomic data to investigate potential splicing regulators and biological function of AASEs (Fig. 4A). We analyzed five types of ASEs: alternative 3′ splice sites (A3), alternative 5′ splice sites (A5), mutually exclusive exons (MX), retained introns (RI), and skipped exons (SE). A total of 1806 AASEs were identified (Supplementary Table S8), and most were SE. These AASEs affected 916 genes in total, and most of them were affected by SE (Fig. 4C). Among the 916 AASE-related genes, 651 genes were modulated by only one type of ASE, while the rest were affected by several types of ASE (Fig. 4B). Interestingly, 60 of 134 MX-related genes were also affected by SE. Using univariate Cox regression analysis, we identified 91 progression-related AASEs (64 affected genes), including 59 SE, 15 MX, 8 RI, 7 A5, and 2 A3 (Fig. 4G, Supplementary Table S10).
Next, we investigated the biological implication of the AASEs. The top listed enrichment terms of the AASE-related genes were associated with gene sets related to cytoskeleton, Rho GTPase, and organelle (Fig. 4D). Notably, Rho GTPase and cytoskeleton pathways were also identified in the enrichment analysis of CO-UP and CO-DOWN DEGs (Fig. 1C). We then analyzed the potentially implicated biological functions by correlating AASE Inclevel with GSVA results from proteomic data (Fig. 4E, Supplementary Methods). Among the top 20 listed gene sets, 8 were related to DNA damage and cell cycle. The others mostly included steroid hormone-related and organic acid-related gene sets.
We then investigated the potential regulators of AASEs. We correlated the protein abundance of annotated SFs with Inclevel of AASEs (Fig. 4F, Supplementary Methods) and progression-related AASEs (Fig. 4H). Among the top 20 listed SFs, 13 were recurrently identified in both correlations. We searched these 13 genes in 4 DLBCL cohorts in cBioPortal (http://cbioportal.org) [31]. Four genes exhibited no genomic events, while a total of 10% (30/300) of patients had genomic events in the other 9 genes (Fig. S5C).
Together, these data indicate that AASEs are widely present in OABLs and associated with prognosis. Genes that exhibited alternative splicing were associated with dysregulated gene sets in OABLs. AASEs were associated with key biological functions, such as DNA damage and cell cycle, which might imply that they function as post-transcriptional regulators of these processes. Some DLBCL patients had genomic events in splicing factors and regulators that were highly correlated with AASEs, which also suggested that alternative splicing may be a driver event of OABL.

ADAR is a core regulator of alternative splicing in OABL and influences key biological functions
ADAR, a member of the adenosine deaminases acting on the RNA family of enzymes, catalyzes the editing of adenosine to inosine in double-stranded RNA. ADAR was recently reported to regulate alternative splicing independent of editing ability [32]. In correlation analyses, we found that ADAR was a top-listed SF recurrently associated with AASEs and progression-related AASEs (Fig. 4F, H). Among the SFs recurrently associated with AASEs, ADAR showed the highest incidence of genomic events (3%, 9 of 300 patients).
We next investigated the potential biological functions associated with ADAR and found that the protein abundance of ADAR was strongly correlated with 280 AASEs (|r| > 0.6), which affected 145 genes. These genes were mostly enriched in Rho GTPase-related gene sets (Fig. 5A), an important gene set and critical transducer of intracellular signaling in tumor initiation and progression [33,34].
To examine the role of ADAR, we identified AASEs in 11 groups of ADAR knockdown (KD), knockout, or overexpression cancer cell lines (Supplementary Methods). A total of 1472 genes were recurrently affected by ADAR-regulated AASEs (Supplementary Table S11). These genes were also enriched in Rho GTPase-related gene sets (Fig. 5B). Because the affected genes were enriched in apoptosis- and proliferation-related gene sets (Fig. 5A-B), we investigated the relationship between ADAR and MKI67. In our data, ADAR protein abundance was positively correlated with MKI67 (r = 0.477, p = 0.002, Fig. 5C). Among 33 tumor types in TCGA databases, 29 exhibited a significantly positive correlation between MKI67 and ADAR expression (p < 0.05, r > 0) and 14 exhibited a strong correlation (p < 0.05, r > 0.4) (Fig. 5D).
To verify the hypothesized regulatory role, we established ADAR KD NHL cell lines (Fig. S6A). ADAR KD cell lines exhibited significantly decreased cell proliferation compared with the control cells (Fig. 5E). We further found that ADAR KD sensitized NHL cell lines to Rho GTPase inhibitors. The IC50 of the Rho GTPase inhibitor MLS000532223 after 72 h of treatment was lower in ADAR KD cell lines compared with controls (Fig. 5F). ADAR KD cell lines also exhibited increased sensitivity to a Rho-kinase inhibitor (HA110 HCL). These results demonstrated that ADAR, a core regulator of alternative splicing in OABL, regulated cell proliferation and sensitivity to Rho GTPase inhibitors.

Proteomic analysis identifies DNAJC9 as a diagnostic marker of EMZL
Pathological diagnosis of OABL remains difficult. Therefore, we examined our proteomic data to investigate a potential diagnostic marker for OABL. A workflow of biomarker detection is shown in Fig. 6A. To screen biomarkers, we first identified 98 differentially expressed proteins and 98 significant proteins through univariate logistic regression. Next, 85 overlapped proteins were included into the lasso penalty regression model, and the analysis yielded four proteins (DNAJC9, TFEB, SUMO3 and MBD1) through 200 iterations of cross-validation. We then performed stepwise logistic regression for these proteins, and DNAJC9 was the only protein identified. To verify the result, we analyzed the protein abundance of DNAJC9 across groups (Fig. 6B). DNAJC9 abundance was significantly higher in OABLs compared with control, IOI, and RLH groups. DNAJC9 abundance was also significantly higher in all subtypes of OABL compared with controls. Compared with the traditional diagnostic markers CD20 and PAX5, DNAJC9 exhibited a relatively higher AUC value for OABLs (Fig. 6C).
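The AUC comparison mentioned above rests on a simple quantity: the probability that a randomly chosen OABL sample has a higher marker abundance than a randomly chosen control. A minimal Python sketch of that calculation follows; the function name and abundance values are invented for illustration and are not data from the study.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical marker abundances for three cases and three controls.
marker_auc = auc([3.1, 2.8, 2.2], [2.5, 1.9, 1.0])
```

In this toy example, eight of the nine case-control pairs are ordered correctly, so the AUC is 8/9. A marker that separates the two groups perfectly gives 1.0, and an uninformative marker gives about 0.5.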
Next, we investigated the diagnostic performance of DNAJC9 in patient FFPE samples. EMZL and DLBCL (the two main subtypes of OABL) were chosen as the experimental group, and inflammation samples were used as the control; paracancer sites of OABLs were additionally counted as controls in the IF analysis. Analysis of subcellular localization revealed that DNAJC9 was localized in the nucleus in OABLs and in lymphoid regions of inflammation tissues. In paracancer sites of OABL and gland regions of inflammation, DNAJC9 was localized in the cytoplasm. Most DNAJC9 was co-expressed with CD20 in the same cell in lymphoid regions and OABLs (Fig. 6D-E). The MFI of DNAJC9 was significantly higher in EMZLs and DLBCLs compared with controls. The MFI of CD20 was not higher in OABLs compared with inflammations (Fig. 6E). IHC analyses showed that the DNAJC9 staining scores of bulk cells and nuclei were both significantly higher in EMZLs compared with inflammations (Fig. 6F-G). For DLBCLs, the staining score of bulk cells was not significantly higher compared with inflammations, and the staining score of nuclei was significantly lower compared with EMZLs. These results demonstrated that DNAJC9, especially strong nuclear staining of DNAJC9, is a promising pathological diagnostic marker of EMZL that can differentiate EMZL from inflammation.

Discussion
OABL is a rare subtype of NHL and a common type of malignancy in the orbital region [1,2,35]. To date, only a few studies have examined the gene expression profile of OABL [7]. Although there are several transcriptomic studies of NHLs, their proteomic and integrated molecular characteristics remain poorly understood [7][8][9]. Our study reports proteotranscriptomic data of OABL for the first time. We performed integrated quantitative proteome and transcriptome analyses, which allowed us to: 1) identify robust dysregulated genes and pathways; 2) gain insights into post-transcriptional expression regulation; and 3) investigate novel disease characteristics of OABL. Together, our findings provide novel insights into the molecular landscape of OABL and identify a promising diagnostic biomarker.
In our data, the disease pathways and DEGs described by the proteome were only partially captured by the transcriptome. Differences in distribution between transcriptomic and proteomic data have been observed previously [10]. Thus, we additionally performed GSEA, a rank-based algorithm, to investigate the dysregulated pathways in OABL. The robustly concordant dysregulations are mostly consistent with previous observations in mature B-cell lymphomas [9]. Post-transcriptional regulation mechanisms of gene expression, including protein sumoylation, RNA m6A modification, and alternative polyadenylation, are current focuses in cancer research [36,37]. These processes can result in different expression patterns between protein and mRNA. We assumed that the global protein-mRNA concordance, computed as the correlation between transcript and protein abundance, reflects the level of impact of post-transcriptional regulation mechanisms in each patient. Consistent with previous observations in breast cancer, our findings showed that high concordance is a disease-specific characteristic of OABL and is associated with poor prognosis [11]. These results suggest that traditional translational regulation of gene expression and expression-independent post-transcriptional modification play a major role in malignancy development.
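The per-patient concordance described above can be sketched as a correlation, across genes, between mRNA and protein abundance within one sample. The choice of Spearman rank correlation here is an assumption, because the text does not name the coefficient. The function names and all abundance values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def concordance(mrna: Dict[str, float], protein: Dict[str, float]) -> float:
    """Spearman correlation over genes quantified in both layers of one sample."""
    shared = sorted(set(mrna) & set(protein))
    return pearson(ranks([mrna[g] for g in shared]),
                   ranks([protein[g] for g in shared]))


# Hypothetical abundances for one sample; G5 has no protein measurement.
mrna = {"G1": 1.0, "G2": 2.0, "G3": 3.0, "G4": 4.0, "G5": 9.9}
protein_high = {"G1": 10.0, "G2": 20.0, "G3": 30.0, "G4": 45.0}
protein_low = {"G1": 40.0, "G2": 10.0, "G3": 30.0, "G4": 20.0}
print(round(concordance(mrna, protein_high), 3),
      round(concordance(mrna, protein_low), 3))  # prints: 1.0 -0.4
```

Only genes measured in both layers enter the calculation, so the number of shared genes can differ between samples and should be reported alongside the coefficient.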
The similarity and association between inflammation and NHLs are frequently reported, but the extent and underlying mechanisms of this relationship have not been thoroughly studied [25][26][27][28]. Herein, we observed a similar situation [25,26]. We therefore constructed a robust inflammation-OABL signature. The NFκB pathway was previously reported to be upregulated in both inflammation and NHL. In our data, the gene expression profiles of inflammation and OABL are similar, and this similarity broadly affects immune-related genes, which confirms and extends the scope of the previously reported relationship. Echoing our previous assumption, alternative splicing, as an expression-independent post-transcriptional regulation mechanism, is specifically dysregulated in OABL in an inflammation-independent manner.
Alternative splicing is a post-transcriptional regulatory mechanism acting on pre-mRNA that allows the generation of multiple splice isoforms from a single gene, and these isoforms can exhibit distinct functions [12]. Numerous studies have demonstrated the oncogenic role of alternative splicing in cancers [29,30]. Although components of the spliceosome are recurrently mutated in hematologic malignancies [14], including the SF3B1 mutation present in approximately 10% of chronic lymphocytic leukemia and the DDX41 mutation in follicular lymphoma and Hodgkin lymphoma, the biological function and oncogenic potential of alternative splicing have not been well studied [38]. From our results on specifically dysregulated gene expression and enriched RNA binding motifs, we speculate that alternative splicing is a potential oncogenic event in OABL. By constructing an AASE landscape, we found that AASEs are highly correlated with important biological functions, and some AASEs predict the progression of OABL. Our analysis further identified ADAR as a core SF of AASEs. ADAR is recurrently and highly correlated with all AASEs and with prognosis-related AASEs, and it is mutated in DLBCL patients. ADAR directly edits and splices RNA and promotes malignancy development and progression [32,39-42], but its oncogenic role in NHL has not been demonstrated. Genes affected by highly correlated AASEs in OABL and genes affected by ADAR-regulated AASEs identified using publicly available datasets were both enriched in Rho GTPase and cell proliferation pathways. We further showed that ADAR regulates cell proliferation and sensitivity to Rho GTPase inhibitors in NHL cell lines. Together, our findings indicate that alternative splicing is an inflammation-independent oncogenic event of OABL, and dysregulation of the splicing regulator ADAR may result in malignancy development and progression.
A major issue in clinical practice is the efficient clinical or pathological differential diagnosis between OABL (especially EMZL) and inflammation [43][44][45][46], which is critical because these diseases require different therapeutic approaches. We constructed a proteome-based workflow and identified DNAJC9 as a potential diagnostic marker of OABL. DNAJC9 is a heat shock protein family member that is a histone co-chaperone and a p53-target gene [47,48]. In inflammation and OABL tissues, DNAJC9 was co-expressed with CD20 and predominantly localized in the nucleus. Our study demonstrates that nuclear staining of DNAJC9 is a promising pathological diagnostic biomarker of EMZL, which may provide important benefits in clinical practice.

Conclusions
OABL is a rare subtype of non-Hodgkin lymphoma, and its molecular characteristics are poorly understood. We performed an integrated study to investigate the proteotranscriptome landscape of OABL. We found that alternative splicing may be the biological foundation for malignancy development. Furthermore, ADAR, a core SF, regulates the proliferation and Rho GTPase inhibitor sensitivity of NHL cell lines. OABL is characterized by high global protein-mRNA concordance, which is a novel recurrence-related characteristic. This study also identified strong nuclear staining of DNAJC9 as a promising pathological diagnostic biomarker of EMZL. Our results provide insights into the biology of OABL and pave the way for clinical practice and further study of OABL.